Beating Atari with Natural Language Guided Reinforcement Learning
Authors
Abstract
We introduce the first deep reinforcement learning agent that learns to beat Atari games with the aid of natural language instructions. The agent uses a multimodal embedding between environment observations and natural language to self-monitor its progress through a list of English instructions, granting itself additional reward for completing instructions in addition to increasing the game score. Our agent significantly outperforms Deep Q-Networks, Asynchronous Advantage Actor-Critic (A3C) agents, and the best agents posted to OpenAI Gym [4] on what is often considered the hardest Atari 2600 environment [2]: MONTEZUMA’S REVENGE.
Videos of trained MONTEZUMA’S REVENGE agents: our best current model (score 3500); the best model currently on OpenAI Gym (score 2500); a standard A3C agent, which fails to learn (score 0).
Figure 1: Left: an agent exploring the first room of MONTEZUMA’S REVENGE. Right: an example list of natural language instructions one might give the agent. The agent grants itself an additional reward after completing the current instruction. “Completion” is learned by training a generalized multimodal embedding between game images and text.
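The self-monitoring reward described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `InstructionRewardTracker`, the use of cosine similarity as the completion test, and the `bonus` and `threshold` values are all assumptions made for illustration. The sketch also assumes that observation and instruction embeddings have already been produced by some learned multimodal model and are passed in as plain vectors.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InstructionRewardTracker:
    """Hypothetical helper: walks through an ordered instruction list and
    adds a bonus to the game reward when the current instruction looks
    completed. Similarity measure and threshold are assumptions."""

    def __init__(
        self,
        instruction_embeddings: List[Sequence[float]],
        bonus: float = 1.0,
        threshold: float = 0.9,
    ) -> None:
        self.instruction_embeddings = list(instruction_embeddings)
        self.bonus = bonus          # assumed extra reward per instruction
        self.threshold = threshold  # assumed completion cutoff
        self.current = 0            # index of the instruction being pursued

    def shaped_reward(
        self, observation_embedding: Sequence[float], game_reward: float
    ) -> float:
        """Return the game reward, plus the bonus if the current
        instruction is judged complete for this observation."""
        if self.current >= len(self.instruction_embeddings):
            return game_reward  # all instructions done; no further bonus
        target = self.instruction_embeddings[self.current]
        if cosine_similarity(observation_embedding, target) >= self.threshold:
            self.current += 1  # advance to the next instruction
            return game_reward + self.bonus
        return game_reward
```

Only the current instruction is checked at each step, so an observation that happens to match a later instruction earns no bonus until the earlier ones have been completed in order.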
Related resources
Learning to Perform Physics Experiments via Deep Reinforcement Learning
When encountering novel objects, humans are able to infer a wide range of physical properties such as mass, friction and deformability by interacting with them in a goal driven way. This process of active interaction is in the same spirit as a scientist performing experiments to discover hidden facts. Recent advances in artificial intelligence have yielded machines that can achieve superhuman p...
An Empirical Analysis of Proximal Policy Optimization with Kronecker-factored Natural Gradients
Deep reinforcement learning methods have shown tremendous success in a large variety of tasks, such as Go [Silver et al., 2016], Atari [Mnih et al., 2013], and continuous control [Lillicrap et al., 2015, Schulman et al., 2015]. Policy gradient methods [Williams, 1992] are an important family of methods in model-free reinforcement learning, and the current state-of-the-art policy gradient methods ar...
Beating the World's Best at Super Smash Bros. with Deep Reinforcement Learning
There has been a recent explosion in the capabilities of game-playing artificial intelligence. Many classes of RL tasks, from Atari games to motor control to board games, are now solvable by fairly generic algorithms, based on deep learning, that learn to play from experience with minimal knowledge of the specific domain of interest. In this work, we will investigate the performance of these me...
Deep Reinforcement Learning With Macro-Actions
Deep reinforcement learning has been shown to be a powerful framework for learning policies from complex high-dimensional sensory inputs to actions in complex tasks, such as the Atari domain. In this paper, we explore output representation modeling in the form of temporal abstraction to improve convergence and reliability of deep reinforcement learning approaches. We concentrate on macro-action...
Atari Games and Intel Processors
The asynchronous nature of state-of-the-art reinforcement learning algorithms, such as the Asynchronous Advantage Actor-Critic algorithm, makes them exceptionally suitable for CPU computations. However, given the fact that deep reinforcement learning often deals with interpreting visual information, a large part of the training and inference time is spent performing convolutions. In this work we...
Journal:
- CoRR
Volume: abs/1704.05539; Issue: -
Pages: -
Publication date: 2017